December 9, 2019

A Policy Challenge

Overview

Problem Definition

  • A third of the approximately 80,000 NYC 8th graders applying to high school sit for the SHSAT.
  • Admitted students represent ~20% of test takers (5,000 students).
  • Admitted students attend 45 “feeder” middle schools (~20% of the schools).
  • Specialized high schools are unbalanced demographically.
  • The biggest imbalances are among admitted Black/Latino, ELL and economically disadvantaged students.

Research Purpose

  • Provide PASSNYC with specific recommendations on increasing the number of SHSAT takers in underperforming school districts.
  • Address factors that improve SHSAT success for students in underperforming districts.

Feeder School Concentration

Specialized High School Acceptances 2019

Graph of Acceptances by Zipcode

Feeder Schools

Brooklyn Zipcode 11204

Feeder Schools Detail

What is PASSNYC?

A not-for-profit that supports underserved NYC DOE students in the admissions process for the city's reputable Specialized High Schools.

PASSNYC

How can PASSNYC address the Diversification of NYC Specialized High Schools using Data Science?

  • Sponsors a Kaggle Competition for Recommender System Solutions.
  • Goal is to close the diversity gap at 8 NYC Specialized High Schools.
  • Recommendations focus on reaching underserved students and preparing them to take the SHSAT.

Data

Exploratory Data Analysis

  • NYC DOE and PASSNYC provide an extensive dataset for researching this project.

Economic Need vs. Income

Economic Need by Ethnicity

Academic Rigor by Ethnicity

Achievement By Cluster

Proportion of students that take the SHSAT

Common Core Test Timeseries - Grades 3-8

Models

PASSNYC Excerpt 1

4 Classifiers

  • Random Forest Classifier (RF)
  • Linear Support Vector Classification (LSVC)
  • Gaussian Naive Bayes (GNB)
  • Linear Discriminant Analysis (LDA)

Metrics

  • Multi-class problem: F1 scores are averaged across the classes
  • The averaging takes label imbalance into account
  • A confusion matrix is computed over all classes; precision, recall and F-score are then computed from these counts
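To make the metric concrete, here is a minimal pure-Python sketch of per-class precision, recall and F1 computed from confusion counts, followed by an unweighted (macro) average and a support-weighted average. The slide does not say which averaging variant the analysis used, so both are shown. The labels and predictions are hypothetical and only illustrate the arithmetic.

```python
from collections import Counter

def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} from confusion counts."""
    labels = sorted(set(y_true) | set(y_pred))
    # Confusion counts keyed by (true label, predicted label).
    confusion = Counter(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1, tp + fn)
    return scores

def macro_f1(scores):
    """Unweighted mean of the per-class F1 scores."""
    return sum(s[2] for s in scores.values()) / len(scores)

def weighted_f1(scores):
    """Mean of per-class F1 weighted by class support (accounts for imbalance)."""
    total = sum(s[3] for s in scores.values())
    return sum(s[2] * s[3] for s in scores.values()) / total

# Hypothetical labels, for illustration only.
y_true = ["low", "low", "low", "low", "mid", "mid", "high"]
y_pred = ["low", "low", "low", "mid", "mid", "low", "high"]

scores = per_class_scores(y_true, y_pred)
print("macro F1:", round(macro_f1(scores), 3))
print("weighted F1:", round(weighted_f1(scores), 3))
```

The macro average treats every class equally, while the weighted average lets the larger classes count for more, which is one way to take label imbalance into account.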

Ranking Predictors of SHSAT Success

Features

PASSNYC Excerpt 2

Data Analysis

  • Schools sending the fewest vs. the most students to Specialized High Schools

Recommendation 1:

Increase the number of SHSAT takers by focusing on schools with good average academics and few takers

SHSAT participation is highly correlated with academic performance. Using linear regression, 33 schools with good or average academics in underperforming districts were identified. Adding approximately 20 candidates per school would yield at least 660 SHSAT candidates.
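The slides do not say how the regression was applied. One plausible reading, sketched here as an assumption, is to fit SHSAT takers against an academic score and flag schools whose actual takers fall well below the fitted line. The school names, scores and the shortfall threshold are all hypothetical. Only the 33 schools and 20 candidates per school come from the slide.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical schools: (name, average academic score, SHSAT takers).
schools = [
    ("A", 2.0, 10),
    ("B", 2.5, 20),
    ("C", 3.0, 30),
    ("D", 3.5, 5),   # good academics, few takers
    ("E", 4.0, 50),
]
xs = [score for _, score, _ in schools]
ys = [takers for _, _, takers in schools]
intercept, slope = fit_line(xs, ys)

# Shortfall = takers predicted from academics minus actual takers.
shortfall = {name: intercept + slope * score - takers
             for name, score, takers in schools}

# Hypothetical cut-off: flag schools at least 20 takers below the fitted line.
targets = [name for name, gap in shortfall.items() if gap >= 20]
print("target schools:", targets)

# Figures from the slide: 33 schools at roughly 20 added candidates each.
estimated_candidates = 33 * 20
print("estimated additional candidates:", estimated_candidates)
```

On this toy data only school D is flagged, since it has strong academics but far fewer takers than the fit predicts.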

Recommendation 2:

Increase SHSAT pass rates by focusing on schools with many “Level4 students” and few offers

There are 472 schools that received 0-5 offers each. Regression models show a high correlation between SHSAT success and Level4 scores on Common Core exams. These 33 schools are estimated to yield 408 additional SHSAT candidates in predominantly Black/Hispanic schools, which would move the needle somewhat on the demographic balance.

Recommendation 3:

Focus on bright students in underperforming schools with low SHSAT participation and low academics.

The common denominator seems to be that these schools all have very low academic performance and a culture that avoids the SHSAT. At least 35 such schools had at least one Level4 result. Focusing resources on these schools to better prepare students in critical reading and math should yield additional Specialized High School seats.

PASSNYC Excerpt 3

Classify and Recommend

  • Identify Underrepresented Schools
  • K-Means clustering into 3 groups by likelihood of admission to SPHS
  • Recommendation of Interventions

Gender, Ethnicity & Need Distributions

Features

Underrepresentation

Model

The elbow method, based on the within-cluster sum of squares, indicates that the optimal number of clusters is 3. Using the 20 features listed above, the k-means algorithm partitions the 472 middle schools with non-demographic data available into 3 clusters with different levels of academic performance, as shown in the following scatterplot.
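The elbow method can be illustrated with a small, self-contained sketch: run k-means for several values of k, record the within-cluster sum of squares (WCSS), and look for the point where the decrease levels off. The data below is a hypothetical 2-feature toy set rather than the 20 school features. The deterministic farthest-point initialisation is a simplification chosen for reproducibility, not necessarily what the original analysis used.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (centroids, labels, wcss)."""
    # Farthest-point initialisation: start at the first point, then keep
    # adding the point farthest from all centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    for _ in range(max_iter):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Mean of the members; an empty cluster keeps its old centroid.
            updated.append(
                tuple(sum(v) / len(members) for v in zip(*members))
                if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Hypothetical 2-feature data with three visible groups.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (1, 8), (1, 9), (2, 8), (2, 9)]

# Elbow method: compute WCSS for a range of k and look for the bend.
wcss_by_k = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, wcss in wcss_by_k.items():
    print(k, round(wcss, 2))
```

On this toy data the WCSS falls steeply up to k = 3 and only marginally afterwards, which is the "elbow" that motivates choosing 3 clusters.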

K-Means Model

Recommendation 1:

On-campus intervention at 5 schools in Cluster A, the middle schools most likely to have students qualified for SPHS.

Recommendation 2:

Awareness campaign about the SHSAT and SPHS at 48 middle schools in Cluster B.

Recommendation 3:

Regional information sessions and workshops at 3 locations for all schools. Middle schools in the top 25% of the Underrepresentation Score cluster around three locations: Harlem, the Bronx and Brooklyn (Broadway Junction). These neighborhoods have a high proportion of Black and Hispanic residents.

Awareness: Organize regional information sessions in Harlem, the Bronx and Brooklyn (Broadway Junction) to direct parents and students to available resources. Organize regional test preparation workshops in these neighborhoods.

PASSNYC Excerpt 4

Decision Tree Classifier

Recommendation:

Why a decision tree workflow? To explain the results to business people and partner organizations.

Whenever a difficult statistical model has to be explained to stakeholders, decision trees can help. A decision tree lays out the complete decision-making process as the series of decisions that leads to a conclusion. Here the focus is on the “yes” class, which means that the particular school is underperforming. This is one of the simplest and most convenient ways to explain a statistical model to partners, or even to a lay audience.

Conclusion

Summary of Analysis and Recommendations

  • The PASSNYC competition highlights the city’s underserved schools geographically, demographically, economically and over time. The recommender systems’ results group schools into 3 categories, roughly equating to: 1) students with high academic scores who don’t take the SHSAT; 2) students with average academic scores, where focused instruction could yield more students eligible for the Specialized High Schools; 3) students with below-average academic scores, where the objective is academic intervention to bring students up to standard.

  • Outreach to students in the first group, who are academically prepared, is a second-order problem requiring awareness and test preparation. The second and third groups require more in-depth academic intervention and preparation during grade school to bring the majority of students up to academic standards and to give the top students among these underserved groups the opportunity to prepare for the SHSAT as well.

  • To the extent that such interventions occur, the demographic balance of the Specialized High Schools will become more aligned with the city’s demographics and these schools will retain their merit-based selectivity.

References

https://www.kaggle.com/passnyc/data-science-for-good/discussion/63311
https://alexromero.shinyapps.io/PASSNYC
https://towardsdatascience.com/data-science-takes-on-public-education-f432910ea9f0
https://patch.com/new-york/new-york-city/just-45-middle-schools-gave-specialized-high-schools-60-percent-their
https://www.nytimes.com/interactive/2018/06/29/nyregion/nyc-high-schools-middle-schools-shsat-students.html
http://jonathansoma.com/lede/algorithms-2017/classes/networks/networkx-graphs-from-source-target-dataframe/
https://programminghistorian.org/en/lessons/exploring-and-analyzing-network-data-with-python